{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    }
   },
   "source": [
    "Hyper Parameter Optimization, aka HPO, is to find a well-performed hyper-parameter on a given search space. The most well-known HPO is grid-search but it only performs well on tiny search space. To resolve hpo on large search space, a lot of algorithms are applied. For example, bayesian optimization is designed for optimizing expensive, black box functions which is very suitable for hpo task. Cost-Frugal optimization on the other hand, taking the training cost into consideration and is aimed to find a better solution within limited cost.\n",
    "\n",
    "One thing to note is even though hpo is a very activate research field and a lot of algorithms have been invented in the last few years, there's still lacking a general, all-in-one hpo alogrithm that performs well on all datasets. So the best way to find out the right hpo algorithm is always try different hpos on your dataset.\n",
    "\n",
    "AutoML.Net provides several hpos for you to try out, and you can configure and replace different hpos easily for `AutoMLExperiment` via setting different tuner. In this notebook, we'll go through the following topics.\n",
    "- Available tuners in AutoML.Net, and how to use it.\n",
    "- Comparing the performance for those tuners."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "// install dependencies and import using statement\n",
    "#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
    "#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
    "#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
    "\n",
    "// make sure you are using Microsoft.ML.AutoML later than 0.20.0.\n",
    "#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22514.1\"\n",
    "#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22514.1\"\n",
    "// Import usings.\n",
    "using System;\n",
    "using System.IO;\n",
    "using System.Net;\n",
    "using Microsoft.ML;\n",
    "using Microsoft.ML.AutoML;\n",
    "using Microsoft.ML.Data;\n",
    "using Microsoft.ML.SearchSpace;\n",
    "using Newtonsoft.Json;\n",
    "using Microsoft.ML.AutoML.CodeGen;\n",
    "using Microsoft.Data.Analysis;\n",
    "using Microsoft.ML.SearchSpace.Option;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Available Tuners in AutoML.Net\n",
    "For now, those tuners are available in AutoML.Net\n",
    "- CostFrugalTuner: low-cost HPO algorithm, this is an implementation of [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571).\n",
    "- SMAC: Bayesian optimziation using random forest as regression model.\n",
    "- EciCostFrugalTuner: CostFrugalTuner for hierarchical search space. This will be used as default tuner if `AutoMLExperiment.SetPipeline` get called.\n",
    "- GridSearch\n",
    "- RandomSearch\n",
    "\n",
    "The following section shows how to use different tuner in `AutoMLExperiment`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var context = new MLContext(1);\n",
    "var experiment = context.Auto().CreateExperiment();\n",
    "\n",
    "// use EciCostFrugalTuner\n",
    "// Note: EciCostFrugalTuner will be set as default tuner if you call \n",
    "// experiment.SetPipeline()\n",
    "experiment.SetEciCostFrugalTuner();\n",
    "\n",
    "// use CostFrugalTuner\n",
    "experiment.SetCostFrugalTuner();\n",
    "\n",
    "// use SMAC\n",
    "experiment.SetSmacTuner();\n",
    "\n",
    "// use GridSearch\n",
    "experiment.SetGridSearchTuner(step: 10);\n",
    "\n",
    "// use RandomSearch\n",
    "experiment.SetRandomSearchTuner(seed: 1);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare GridSearch and EciCostFrugal on titanic dataset\n",
    "\n",
    "The following section shows how different hpo effect automl performance, by comparing metric trend from GridSearch and EciCostFrugal on titanic dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    }
   },
   "source": [
    "## Download titanic if necessary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "string EnsureDataSetDownloaded(string fileName)\n",
    "{\n",
    "\n",
    "\t// This is the path if the repo has been checked out.\n",
    "\tvar filePath = Path.Combine(Directory.GetCurrentDirectory(),\"data\", fileName);\n",
    "\n",
    "\tif (!File.Exists(filePath))\n",
    "\t{\n",
    "\t\t// This is the path if the file has already been downloaded.\n",
    "\t\tfilePath = Path.Combine(Directory.GetCurrentDirectory(), fileName);\n",
    "\t}\n",
    "\n",
    "\tif (!File.Exists(filePath))\n",
    "\t{\n",
    "\t\tusing (var client = new WebClient())\n",
    "\t\t{\n",
    "\t\t\tclient.DownloadFile($\"https://raw.githubusercontent.com/dotnet/csharp-notebooks/main/machine-learning/data/{fileName}\", filePath);\n",
    "\t\t}\n",
    "\t\tConsole.WriteLine($\"Downloaded {fileName}  to : {filePath}\");\n",
    "\t}\n",
    "\telse\n",
    "\t{\n",
    "\t\tConsole.WriteLine($\"{fileName} found here: {filePath}\");\n",
    "\t}\n",
    "\n",
    "\treturn filePath;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var trainDataPath = EnsureDataSetDownloaded(\"titanic-train.csv\");\n",
    "var df = DataFrame.LoadCsv(trainDataPath);\n",
    "\n",
    "var trainTestSplit = context.Data.TrainTestSplit(df, 0.1);\n",
    "df.Head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Construct pipeline and AutoMLExperiment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "var pipeline = context.Auto().Featurizer(df, excludeColumns: new[]{\"Survived\"})\n",
    "                        .Append(context.Transforms.Conversion.ConvertType(\"Survived\", \"Survived\", DataKind.Boolean))\n",
    "\t\t\t\t\t    .Append(context.Auto().BinaryClassification(labelColumnName: \"Survived\"));\n",
    "// Configure AutoML\n",
    "var monitor = new NotebookMonitor(pipeline);\n",
    "\n",
    "var experiment = context.Auto().CreateExperiment()\n",
    "                    .SetPipeline(pipeline)\n",
    "                    .SetTrainingTimeInSeconds(10)\n",
    "                    .SetDataset(trainTestSplit.TrainSet, trainTestSplit.TestSet)\n",
    "                    .SetBinaryClassificationMetric(BinaryClassificationMetric.Accuracy, \"Survived\", \"PredictedLabel\")\n",
    "                    .SetMonitor(monitor);\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run HPO using GridSearch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "experiment.SetGridSearchTuner(step: 10);\n",
    "await experiment.RunAsync();\n",
    "var gridSearchTrial = monitor.CompletedTrials.ToArray();\n",
    "monitor.CompletedTrials.Clear();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run HPO using EciCostFrugal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "experiment.SetEciCostFrugalTuner();\n",
    "await experiment.RunAsync();\n",
    "var eciSearchTrials = monitor.CompletedTrials.ToArray();\n",
    "monitor.CompletedTrials.Clear();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare HPO performace among GridSearch, EciCostFrugal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "vscode": {
     "languageId": "dotnet-interactive.csharp"
    }
   },
   "outputs": [],
   "source": [
    "using Plotly.NET;\n",
    "\n",
    "var gridSearchChart = Chart2D.Chart.Line<int, float, string>(gridSearchTrial.Select(t => t.TrialSettings.TrialId), gridSearchTrial.Select(t => (float)t.Metric), Name: \"grid_search\");\n",
    "var eciCfoSearchChart = Chart2D.Chart.Line<int, float, string>(eciSearchTrials.Select(t => t.TrialSettings.TrialId), eciSearchTrials.Select(t => (float)t.Metric), Name: \"eci_cfo\");\n",
    "var combineChart = Chart.Combine(new[]{ gridSearchChart, eciCfoSearchChart});\n",
    "combineChart.Display()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Continue learning\n",
    "\n",
    "> [⏪ Last Module - AutoML Tuner](./06-AutoML%20HPO%20and%20tuner.ipynb)\n",
    "\n",
    "## See also\n",
    "\n",
    "- [AutoML SearchSpace](./Parameter%20and%20SearchSpace.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".NET (C#)",
   "language": "C#",
   "name": ".net-csharp"
  },
  "language_info": {
   "file_extension": ".cs",
   "mimetype": "text/x-csharp",
   "name": "C#",
   "pygments_lexer": "csharp",
   "version": "9.0"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
